1 Overview

This session gives a hands-on feel for conducting a bioinformatics workshop.

In the next 2 hours, we will cover the following sections:

  1. Introduction to Bioinformatics
  2. Introduction to R
  3. RNA-sequencing analysis

2 The question that we will answer

What are the differentially expressed genes between normal cells and cancer cells?

Cancer is a disease in which some of the body’s cells grow uncontrollably and spread to other parts of the body. Cancer is caused by certain changes to genes.

3 Introduction to Cancer Bioinformatics

FIGURE 1 - The overarching concept of bioinformatics: figuring out the difference between two systems, that is, what pattern is present in one image compared to the other. Image taken from Michael Edwards' lecture.

4 Introduction to R and RStudio

What is R?

  • R is an open-source language and environment for statistical computing and graphics, widely used by scientists.
  • R is both a programming language and an environment for statistical computing, data visualization, data science, and machine learning.
  • RStudio is an integrated development environment (IDE) for R and Python.
  • RStudio provides a graphical user interface for working with R.
  • In this session, we will showcase a cloud-based RStudio Server.
  • Users can also install R and RStudio locally on their own device.

Introduction to the RStudio interface

  • The panel at the top left is the script editor
  • Basic math operations

Addition

3 + 3
## [1] 6

Multiplication

3 * 3
## [1] 9

Storing variables in R

num1 <- 5    # `<-` is the usual assignment operator in R
num2 = 10    # `=` also assigns when used at the top level

num1 + num2
## [1] 15

A more practical example: let's create two vectors, each storing multiple values.

# Create vectors to hold plant heights from each sample
group1 <- c(8, 8, 9, 9, 9, 11, 12, 13, 13, 14)
group2 <- c(22, 23, 24, 24, 25, 26, 27, 20, 26, 28)
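Before summarizing the vectors, it can help to see how R treats them. The following is a minimal sketch of a few base R vector operations; the variable name `diffs` is introduced here for illustration only.

```r
# Plant heights from the two samples (same values as above)
group1 <- c(8, 8, 9, 9, 9, 11, 12, 13, 13, 14)
group2 <- c(22, 23, 24, 24, 25, 26, 27, 20, 26, 28)

# How many values does the vector hold?
length(group1)

# Indexing starts at 1 in R: first value, then the first three values
group1[1]
group1[1:3]

# Arithmetic on vectors is element-wise: the first value of group2 minus
# the first value of group1, the second minus the second, and so on
diffs <- group2 - group1
diffs
```

Because arithmetic is element-wise, no explicit loop is needed to compare the two samples position by position.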

Let's get the sum of group1 by adding the values by hand

8 + 8 + 9 + 9 + 9 + 11 + 12 + 13 + 13 + 14
## [1] 106

Now the mean

(8 + 8 + 9 + 9 + 9 + 11 + 12 + 13 + 13 + 14) / 10
## [1] 10.6

Typing every value is tedious and error-prone, so use R's built-in functions instead.

Getting the sum and the mean

sum(group1)
## [1] 106
mean(group1)
## [1] 10.6
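The same pattern applies to other summaries. As a rough sketch, here are a few more base R functions applied to the same vectors; `manual_mean` is a name chosen here for illustration.

```r
# Plant heights from the two samples (same values as above)
group1 <- c(8, 8, 9, 9, 9, 11, 12, 13, 13, 14)
group2 <- c(22, 23, 24, 24, 25, 26, 27, 20, 26, 28)

# Middle value and spread of group1
median(group1)
sd(group1)

# Smallest and largest value of group2
range(group2)

# Minimum, quartiles, median, mean and maximum in one call
summary(group2)

# mean() is the sum divided by the number of values
manual_mean <- sum(group2) / length(group2)
manual_mean
```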

4.1 Visualize data

# Calculate means
means <- c(mean(group1), mean(group2))

# Make bar plot
barplot(means, names.arg = c("Group 1", "Group 2"),
        col = c("skyblue", "salmon"),
        main = "Average Plant Height",
        ylab = "Mean Height")

We can show the same data as a box plot as well, which displays the spread of each group rather than only its mean

boxplot(group1, group2,
        names = c("Group 1", "Group 2"),
        col = c("skyblue", "salmon"),
        main = "Boxplot of Plant Heights",
        ylab = "Height")
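To keep a copy of a plot, one option is to draw it into a file instead of the screen. The sketch below uses base R's pdf() and dev.off(); the file name and the use of a temporary directory are assumptions made for this example, not part of the workshop material.

```r
# Plant heights from the two samples (same values as above)
group1 <- c(8, 8, 9, 9, 9, 11, 12, 13, 13, 14)
group2 <- c(22, 23, 24, 24, 25, 26, 27, 20, 26, 28)

# Hypothetical output location; replace with a path of your choice
out_file <- file.path(tempdir(), "plant_heights_boxplot.pdf")

# Open a PDF graphics device, draw the plot into it, then close the device
pdf(out_file, width = 6, height = 5)
boxplot(group1, group2,
        names = c("Group 1", "Group 2"),
        col = c("skyblue", "salmon"),
        main = "Boxplot of Plant Heights",
        ylab = "Height")
dev.off()

# Check that the file was written
file.exists(out_file)
```

The file is only complete once dev.off() has been called, so the two calls should always be paired.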

4.2 Statistical test

Perform a t-test. The t-test is run with the t.test() function, which tests whether the mean of a variable differs between two groups. By default, t.test() runs Welch's version of the test, which does not assume that the two groups have equal variances.

t.test(group1, group2)
## 
##  Welch Two Sample t-test
## 
## data:  group1 and group2
## t = -13.26, df = 17.932, p-value = 1.048e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.10295 -11.69705
## sample estimates:
## mean of x mean of y 
##      10.6      24.5
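To see where the t, df and p-value figures in this output come from, the Welch statistic can be computed by hand. This is a sketch of the standard Welch formulas, not a replacement for t.test(); the names `se`, `t_stat`, `df` and `p_value` are chosen here for illustration.

```r
# Plant heights from the two samples (same values as above)
group1 <- c(8, 8, 9, 9, 9, 11, 12, 13, 13, 14)
group2 <- c(22, 23, 24, 24, 25, 26, 27, 20, 26, 28)

n1 <- length(group1)
n2 <- length(group2)

# Squared standard error contributed by each group
v1 <- var(group1) / n1
v2 <- var(group2) / n2

# Standard error of the difference in means
se <- sqrt(v1 + v2)

# t statistic: difference in means divided by its standard error
t_stat <- (mean(group1) - mean(group2)) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

t_stat
df
p_value
```

The three values should agree with the t, df and p-value lines printed by t.test(group1, group2).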

t.test() returns a lot of information: the estimated group means (estimate), the confidence interval for the difference in means (conf.int), the p-value (p.value), etc.
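These pieces can be pulled out by name once the result is stored in a variable. A minimal sketch, assuming the result is saved as `res` (a name chosen here for illustration):

```r
# Plant heights from the two samples (same values as above)
group1 <- c(8, 8, 9, 9, 9, 11, 12, 13, 13, 14)
group2 <- c(22, 23, 24, 24, 25, 26, 27, 20, 26, 28)

# Store the test result instead of only printing it
res <- t.test(group1, group2)

# List the available components
names(res)

# Access individual components with $
res$estimate    # the two group means
res$conf.int    # 95% confidence interval for the difference in means
res$p.value     # p-value of the test
res$statistic   # t statistic

# Difference in means, computed from the two estimates
diff_means <- unname(res$estimate[1] - res$estimate[2])
diff_means
```

Storing the result this way makes it possible to reuse the p-value or confidence interval later in a script, for example when many tests are run in a row.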

5 RNA-sequencing analysis